Distributed Vector Architecture: Beyond a Single Vector-IRAM
Abstract
Integration of memory on the same die as the processor (IRAM) has the potential to offer unprecedented bandwidth that can be exploited efficiently by vector processors. However, real-world scientific vector applications, with their very large memory requirements and poor locality, would easily overflow any single IRAM device. In this environment, traditional approaches such as caching or paging generate considerable traffic, diminishing the performance advantage of processor-memory integration. To exploit the full potential of IRAM in the realm of large-scale scientific computing, we propose a DIstributed Vector Architecture (DIVA) that uses multiple vector-capable IRAM nodes in a distributed shared-memory configuration. The advantages of our approach are twofold: (i) we speed up the execution of vector instructions by parallelizing them across the nodes, and (ii) we reduce external traffic by bringing computation to data rather than data to computation. We dynamically map the computation of individual vector instructions onto nodes so that it coincides, to the extent possible, with the corresponding data in memory. As an implementation, we propose a mechanism that assigns elements of the architectural vector registers to nodes at run time, using the layout of data in memory as a blueprint. Using traces of vector supercomputer programs, we demonstrate that DIVA often generates considerably less external traffic than single- or multiple-node alternatives based solely on caching or paging. Considerable performance gains are then possible because of DIVA's inter-node parallelism.
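The idea of using the memory layout as a blueprint for placing vector register elements can be illustrated with a small sketch. This is not the mechanism from the paper: it assumes, purely for illustration, that pages are interleaved round-robin across nodes, and the function names (`home_node`, `map_vector_elements`, `fetches_to_single_node`, `inter_node_transfers`) are hypothetical.

```python
def home_node(address, page_size, num_nodes):
    """Node that owns the page containing `address`.

    Assumption: pages are interleaved round-robin across nodes.
    """
    return (address // page_size) % num_nodes


def map_vector_elements(base, stride, length, page_size, num_nodes):
    """Assign each element of a strided vector to the node that holds its data."""
    return [home_node(base + i * stride, page_size, num_nodes)
            for i in range(length)]


def fetches_to_single_node(mapping, node=0):
    """Elements a single node would have to fetch from other nodes."""
    return sum(1 for owner in mapping if owner != node)


def inter_node_transfers(mapping_a, mapping_b):
    """Elements of two operands whose home nodes differ, so one must move."""
    return sum(1 for a, b in zip(mapping_a, mapping_b) if a != b)


# Hypothetical example: 8 elements, 8-byte stride, 16-byte pages, 4 nodes.
a = map_vector_elements(base=0, stride=8, length=8, page_size=16, num_nodes=4)
b_aligned = map_vector_elements(base=64, stride=8, length=8, page_size=16, num_nodes=4)
b_shifted = map_vector_elements(base=16, stride=8, length=8, page_size=16, num_nodes=4)

print(a)                                   # home node of each element of a
print(fetches_to_single_node(a))           # traffic if node 0 did all the work
print(inter_node_transfers(a, b_aligned))  # operands laid out on the same nodes
print(inter_node_transfers(a, b_shifted))  # operands laid out on different nodes
```

In this toy model, computing each element on its home node removes the fetches a single node would need, and any remaining traffic comes from operands whose layouts do not line up, which matches the abstract's qualifier "to the extent possible".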
Related resources
A Media-Enhanced Vector Architecture for Embedded Memory Systems
Next generation portable devices will require processors with both low energy consumption and high performance for media functions. At the same time, modern CMOS technology creates the need for highly scalable VLSI architectures. Conventional processor architectures fail to meet these requirements. This paper presents the architecture of Vector IRAM (VIRAM), a processor that combines vector pro...
Distributed vector architectures
Integrating processors and main memory is a promising approach to increase system performance. Such integration provides very high memory bandwidth that can be exploited efficiently by vector operations. However, traditional vector applications would easily overflow the limited memory of a single integrated node. To accommodate such workloads, we propose the DIstributed Vector Architecture (DIV...
For Embedded Applications with Data-Level Parallelism, a Vector Processor Offers High Performance at Low Power Consumption and Low Design Complexity. Unlike Superscalar and VLIW Designs, a Vector Processor Is Scalable and Can Optimally Match Specific
Designers of embedded processors have typically optimized for low power consumption and low design complexity to minimize cost. Performance was a secondary consideration. Nowadays, many embedded systems (set-top boxes, game consoles, personal digital assistants, and cell phones) commonly perform computation-intensive media tasks such as video processing, speech transcoding, graphics, and high-b...
Hardware/Compiler Co-development for an Embedded Media Processor
Embedded and portable systems running multimedia applications create a new challenge for hardware architects. The microprocessor needed for such systems is a merged general-purpose processor and digital-signal processor, with the programmability of the former and the performance and power budget of the latter. This paper presents the co-development of the instruction set, the hardware, and the com...
Image Segmentation on IRAM
The Computer Vision group at U.C. Berkeley recently developed a novel approach to image segmentation, called the Normalized Cuts algorithm. The current implementation of the algorithm has an execution time on the order of minutes for medium-sized images running on conventional scalar machines. This paper explores the current bottlenecks and seeks to maximize the performance by porting the algor...
Publication date: 1997